📝 Walkthrough

The pull request introduces a new dbt finance model that analyzes revenue metrics for products with an "Excellent" rating tier (rating_tier = 'Excellent'). It updates the core dimension schema with data type specifications and a new column, then adds comprehensive schema documentation with governance controls and data quality tests for the new cohort model.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🧹 Nitpick comments (2)
dbt/models/marts/finance/finance_excellent_cohort_revenue.sql (1)
41-47: Optional: eliminate the duplicate window computation via an extra CTE layer.
`sum(product_total_revenue) over ()` is evaluated twice in the same `SELECT` (once for `cohort_total_revenue`, once inside `pct_of_cohort_revenue`). Most optimizers deduplicate this, but factoring it into a separate CTE makes the intent explicit and removes any optimizer dependency.

♻️ Proposed refactor

```diff
-final as (
+cohort as (
     select
         product_id,
         product_name,
         product_category,
         avg_rating,
         rating_tier,
         review_count,
         positive_reviews,
         negative_reviews,
         unit_price,
         unit_margin,
         margin_pct,
         total_units_sold,
         product_total_revenue,
         product_gross_profit,
-        sum(product_total_revenue) over () as cohort_total_revenue,
-        round(
-            product_total_revenue / nullif(sum(product_total_revenue) over (), 0) * 100,
-            2
-        ) as pct_of_cohort_revenue
+        sum(product_total_revenue) over () as cohort_total_revenue
     from excellent_products
-),
+),
+
+final as (
+    select
+        *,
+        round(
+            product_total_revenue / nullif(cohort_total_revenue, 0) * 100,
+            2
+        ) as pct_of_cohort_revenue
+    from cohort
+)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbt/models/marts/finance/finance_excellent_cohort_revenue.sql` around lines 41-47: extract the duplicated window sum into an intermediate CTE so the expression `sum(product_total_revenue) over ()` is computed once. In practice, add a CTE (e.g., `excellent_products_with_total`) that selects `product_total_revenue` and computes `cohort_total_revenue` as `sum(product_total_revenue) over ()`, then in the final SELECT derive `pct_of_cohort_revenue = round(product_total_revenue / nullif(cohort_total_revenue, 0) * 100, 2)`; update references to use the new CTE and keep the column names `product_total_revenue`, `cohort_total_revenue`, and `pct_of_cohort_revenue` the same.

dbt/models/marts/core/schema.yml (1)
91-97: Add `not_null` and `accepted_values` tests to `rating_tier`.

The column drives the cohort filter in `finance_excellent_cohort_revenue` (`WHERE rating_tier = 'Excellent'`). Without data-quality guards, a NULL value (possible when avg_rating is NULL from the left join with reviews) or a misspelled tier silently produces an empty or corrupt cohort. The description already enumerates the four valid tiers, making `accepted_values` straightforward. Business rules that require columns to use a valid set are a natural fit for dbt's `accepted_values` test.

🛡️ Proposed tests for `rating_tier`

```diff
       - name: rating_tier
         description: >
           Business-logic classification of product quality derived from avg_rating.
           Tiers are defined as: Excellent (avg_rating >= 4.5), Good (avg_rating >= 3.5),
           Average (avg_rating >= 2.5), and Poor (avg_rating < 2.5). Used as the cohort
           key for rating-based revenue and margin analysis
           (e.g. finance_excellent_cohort_revenue).
         data_type: varchar
+        tests:
+          - not_null
+          - accepted_values:
+              values: ['Excellent', 'Good', 'Average', 'Poor']
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbt/models/marts/core/schema.yml` around lines 91-97: add dbt tests for the rating_tier column to prevent NULLs and invalid tier values. In the schema.yml entry for the model that defines rating_tier, add a not_null test and an accepted_values test listing the four allowed strings ("Excellent", "Good", "Average", "Poor"); this ensures cohorts like finance_excellent_cohort_revenue (which filters WHERE rating_tier = 'Excellent') won't silently fail due to NULLs or misspellings.
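As a rough sketch of what the proposed tests guard against (this is not dbt's exact compiled SQL, and the table contents below are invented), the `accepted_values` and `not_null` checks boil down to queries like these, run here against an in-memory SQLite stand-in for the dimension model:

```python
import sqlite3

# Toy stand-in for the dimension model; the typo 'Exellent' and the NULL
# row are exactly the failure modes the proposed tests would catch.
conn = sqlite3.connect(":memory:")
conn.execute("create table dim_products (product_id int, rating_tier text)")
conn.executemany(
    "insert into dim_products values (?, ?)",
    [(1, "Excellent"), (2, "Good"), (3, "Exellent"), (4, None)],
)

# Roughly the shape of the accepted_values test: any row whose value falls
# outside the allowed set is a failure. Note NULLs are skipped here, which
# is why the separate not_null test is also needed.
bad_values = conn.execute("""
    select product_id, rating_tier
    from dim_products
    where rating_tier is not null
      and rating_tier not in ('Excellent', 'Good', 'Average', 'Poor')
""").fetchall()

# Roughly the shape of the not_null test.
nulls = conn.execute(
    "select product_id from dim_products where rating_tier is null"
).fetchall()

print(bad_values)  # [(3, 'Exellent')]
print(nulls)       # [(4,)]
```

Both queries returning zero rows is the passing condition; either test alone would let one of the two bad rows slip through.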
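To sanity-check that the suggested CTE refactor is behavior-preserving, here is a small SQLite sketch (the table name mirrors the model's CTE, but the revenue values are invented) comparing the single-SELECT shape with the two-CTE shape:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table excellent_products (product_id int, product_total_revenue real)"
)
conn.executemany(
    "insert into excellent_products values (?, ?)",
    [(1, 500.0), (2, 300.0), (3, 200.0)],
)

# Original shape: the window sum appears twice in one SELECT.
original = conn.execute("""
    select
        product_id,
        sum(product_total_revenue) over () as cohort_total_revenue,
        round(product_total_revenue
              / nullif(sum(product_total_revenue) over (), 0) * 100, 2)
            as pct_of_cohort_revenue
    from excellent_products
    order by product_id
""").fetchall()

# Refactored shape: compute the window sum once in a CTE, then derive the
# percentage from the aliased column.
refactored = conn.execute("""
    with cohort as (
        select
            product_id,
            product_total_revenue,
            sum(product_total_revenue) over () as cohort_total_revenue
        from excellent_products
    )
    select
        product_id,
        cohort_total_revenue,
        round(product_total_revenue / nullif(cohort_total_revenue, 0) * 100, 2)
            as pct_of_cohort_revenue
    from cohort
    order by product_id
""").fetchall()

print(original == refactored)  # True
print(original)  # [(1, 1000.0, 50.0), (2, 1000.0, 30.0), (3, 1000.0, 20.0)]
```

The two result sets are identical, so the refactor is purely about readability and removing the reliance on optimizer deduplication.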
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- dbt/models/marts/core/schema.yml
- dbt/models/marts/finance/finance_excellent_cohort_revenue.sql
- dbt/models/marts/finance/schema.yml
```yaml
      - name: "pct_of_cohort_revenue sums to 100"
        type: "business_logic"
        sql: "SELECT * FROM (SELECT ABS(SUM(pct_of_cohort_revenue) - 100) AS diff FROM {{ model }}) t WHERE diff > 0.01"
        expected_result: "0 rows"
        description: "Validates that pct_of_cohort_revenue values sum to 100% across all products, within a 0.01 rounding tolerance."
```
pct_of_cohort_revenue sum tolerance of 0.01 will cause systematic false test failures.
pct_of_cohort_revenue is rounded to 2 decimal places. The maximum rounding error per product is ±0.005, so the accumulated error across N products is at most N × 0.005. With just 7 equal-revenue products the sum is already 100.03 (diff = 0.03 > 0.01), triggering a test failure. Any realistic "Excellent" cohort with non-trivially-divisible revenue shares will exceed the 0.01 threshold.
The tolerance should be at least N_products / 200 (i.e., the theoretical maximum accumulation), or a practical constant like 0.5:
🐛 Proposed fix
```diff
-        sql: "SELECT * FROM (SELECT ABS(SUM(pct_of_cohort_revenue) - 100) AS diff FROM {{ model }}) t WHERE diff > 0.01"
+        sql: "SELECT * FROM (SELECT ABS(SUM(pct_of_cohort_revenue) - 100) AS diff FROM {{ model }}) t WHERE diff > 0.5"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbt/models/marts/finance/schema.yml` around lines 161 - 165, The
business_logic test "pct_of_cohort_revenue sums to 100" is using a fixed 0.01
tolerance which is too tight because pct_of_cohort_revenue is rounded to 2
decimals and rounding error accumulates with N products; update the test SQL to
use a safe tolerance (either a practical constant like 0.5 or a calculated
maximum such as COUNT(DISTINCT product_id)/200) instead of 0.01 — edit the test
named "pct_of_cohort_revenue sums to 100" in schema.yml and change the WHERE
clause so it compares ABS(SUM(pct_of_cohort_revenue) - 100) against the new
threshold (e.g., > 0.5 or > (COUNT(DISTINCT product_id)/200) computed from {{
model }}).
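The rounding-drift arithmetic above can be reproduced directly in plain Python (no dbt involved; the 7-product equal-revenue cohort is the worked example from the comment):

```python
# 7 products with equal revenue, each share rounded to 2 decimal places
# the same way pct_of_cohort_revenue is.
n = 7
shares = [round(100 / n, 2) for _ in range(n)]  # each share is 14.29
total = round(sum(shares), 2)

diff = abs(total - 100)
print(total)           # 100.03
print(diff > 0.01)     # True  -> fails the current 0.01 tolerance
print(diff > n / 200)  # False -> passes the N_products / 200 tolerance
```

Seven rounded shares already overshoot by 0.03, so any fixed tolerance tighter than the theoretical N × 0.005 accumulation will produce false failures on ordinary data.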
```yaml
      - name: pct_of_cohort_revenue
        description: >
          Each product's share of the Excellent cohort's total revenue, expressed as a
          percentage (0–100). Calculated as product_total_revenue / cohort_total_revenue * 100,
          rounded to 2 decimal places. Useful for identifying which Excellent products
          are the largest revenue contributors within the cohort.
        data_type: numeric
```
pct_of_cohort_revenue is missing a not_null test.
cohort_total_revenue directly above it has - not_null, but pct_of_cohort_revenue does not. The SQL computes it as product_total_revenue / nullif(sum(...) over (), 0) * 100, which produces NULL whenever the cohort revenue sum is zero. A not_null test here surfaces that edge case at test time rather than silently exposing NULLs to consumers.
🛡️ Proposed addition
```diff
       - name: pct_of_cohort_revenue
         description: >
           Each product's share of the Excellent cohort's total revenue, expressed as a
           percentage (0–100). Calculated as product_total_revenue / cohort_total_revenue * 100,
           rounded to 2 decimal places. Useful for identifying which Excellent products
           are the largest revenue contributors within the cohort.
         data_type: numeric
+        tests:
+          - not_null
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbt/models/marts/finance/schema.yml` around lines 250 - 256, The
pct_of_cohort_revenue field in the schema.yml is missing a not_null test which
allows NULLs when cohort_total_revenue is zero; add the - not_null test under
the pct_of_cohort_revenue column entry so dbt will fail the build if the metric
is NULL (this mirrors the existing not_null on cohort_total_revenue and ensures
the calculated column pct_of_cohort_revenue is validated).
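The zero-revenue edge case described in this comment can be sketched in SQLite (degenerate data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Degenerate cohort: every product has zero revenue, so the window sum is 0.
conn.execute(
    "create table excellent_products (product_id int, product_total_revenue real)"
)
conn.executemany(
    "insert into excellent_products values (?, ?)",
    [(1, 0.0), (2, 0.0)],
)

rows = conn.execute("""
    select
        product_id,
        round(product_total_revenue
              / nullif(sum(product_total_revenue) over (), 0) * 100, 2)
            as pct_of_cohort_revenue
    from excellent_products
    order by product_id
""").fetchall()

# nullif(0, 0) is NULL, so the whole expression is NULL for every row;
# exactly the case a not_null test on pct_of_cohort_revenue would surface.
print(rows)  # [(1, None), (2, None)]
```

Without the proposed test, these NULLs would flow silently into any dashboard reading the model.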
Test PR from data portal
Made with Cursor